AITopics | path-normalized optimization

Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations

Neural Information Processing Systems, Nov-21-2025, 14:51:55 GMT

We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of the path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations. On several datasets that require capturing long-term dependency structure, we show that path-SGD can significantly improve trainability of ReLU RNNs compared to RNNs trained with SGD, even with various recently suggested initialization schemes.

name change, path-normalized optimization, recurrent neural network, (4 more...)

Path-SGD: Path-Normalized Optimization in Deep Neural Networks

Neural Information Processing Systems, Aug-12-2025, 23:43:54 GMT

We revisit the choice of SGD for training deep neural networks by reconsidering the appropriate geometry in which to optimize the weights. We argue for a geometry invariant to rescaling of weights that does not affect the output of the network, and suggest Path-SGD, which is an approximate steepest descent method with respect to a path-wise regularizer related to max-norm regularization. Path-SGD is easy and efficient to implement and leads to empirical gains over SGD and AdaGrad.
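The rescaling invariance mentioned in this abstract can be illustrated with a small sketch. It assumes a two-layer ReLU network without biases, and it uses the sum over all input-to-output paths of the squared product of weights as a stand-in for the path-wise regularizer. The function names (`forward`, `rescale_hidden_unit`, `path_norm_sq`, `sum_sq`) and the toy weights are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, x):
    return [sum(w * a for w, a in zip(row, x)) for row in W]

def forward(W1, W2, x):
    """Two-layer ReLU network without biases: W2 @ relu(W1 @ x)."""
    return matvec(W2, relu(matvec(W1, x)))

def rescale_hidden_unit(W1, W2, j, c):
    """Multiply the incoming weights of hidden unit j by c > 0 and divide
    its outgoing weights by c. ReLU is positively homogeneous, so the
    function computed by the network stays the same."""
    W1r = [[w * c for w in row] if r == j else list(row)
           for r, row in enumerate(W1)]
    W2r = [[w / c if col == j else w for col, w in enumerate(row)]
           for row in W2]
    return W1r, W2r

def path_norm_sq(W1, W2):
    """Sum over all input -> hidden -> output paths of the squared
    product of the weights along the path (assumed regularizer form)."""
    return sum((W2[k][j] * W1[j][i]) ** 2
               for k in range(len(W2))
               for j in range(len(W1))
               for i in range(len(W1[0])))

def sum_sq(W1, W2):
    """Ordinary squared Euclidean norm of all weights, for comparison."""
    return sum(w * w for W in (W1, W2) for row in W for w in row)

# Toy weights (illustrative values only).
W1 = [[0.5, -1.0], [2.0, 0.25]]
W2 = [[1.0, -0.5]]
x = [2.0, 0.5]

W1r, W2r = rescale_hidden_unit(W1, W2, j=0, c=4.0)

print("output:   ", forward(W1, W2, x), forward(W1r, W2r, x))
print("path norm:", path_norm_sq(W1, W2), path_norm_sq(W1r, W2r))
print("L2 norm:  ", sum_sq(W1, W2), sum_sq(W1r, W2r))
```

In this sketch the output and the path-based quantity are unchanged by the rescaling, while the ordinary squared Euclidean norm of the weights changes, which is one way to see why a Euclidean geometry is sensitive to a reparameterization that does not affect the network function.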

name change, path-normalized optimization, path-sgd, (2 more...)

Reviews: Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations

Neural Information Processing Systems, Jan-20-2025, 13:24:53 GMT

This seems to be a worthwhile goal (since plain RNNs are computationally cheaper and easier to analyze theoretically) and their experiments show some promising results in improving performance over plain RNNs trained with existing optimization methods. However, it is not clear to me how the method that the authors use in practice differs significantly from regular Path-SGD introduced in previous work. The authors do present an adaptation of Path-SGD to networks with shared weights, and show that the new rescaling term applied to the gradients can be divided into two terms k1 and k2. But then, they note that the second term, which accounts for interactions between shared weights along the same path, is expensive to calculate for RNNs and show some empirical evidence that including it does not help performance. In the rest of the experiments, they ignore the second term, which to my understanding is essentially what makes the method introduced here different from regular Path-SGD.

experiment, path-normalized optimization, second term, (11 more...)

Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations

Neyshabur, Behnam, Wu, Yuhuai, Salakhutdinov, Russ R., Srebro, Nati

Neural Information Processing Systems, Feb-14-2020, 13:58:26 GMT

We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of the path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations. On several datasets that require capturing long-term dependency structure, we show that path-SGD can significantly improve trainability of ReLU RNNs compared to RNNs trained with SGD, even with various recently suggested initialization schemes. Papers published at the Neural Information Processing Systems Conference.

path-normalized optimization, recurrent neural network, relu activation, (2 more...)

Path-SGD: Path-Normalized Optimization in Deep Neural Networks

Neyshabur, Behnam, Salakhutdinov, Russ R., Srebro, Nati

Neural Information Processing Systems, Feb-14-2020, 11:42:42 GMT

We revisit the choice of SGD for training deep neural networks by reconsidering the appropriate geometry in which to optimize the weights. We argue for a geometry invariant to rescaling of weights that does not affect the output of the network, and suggest Path-SGD, which is an approximate steepest descent method with respect to a path-wise regularizer related to max-norm regularization. Path-SGD is easy and efficient to implement and leads to empirical gains over SGD and AdaGrad. Papers published at the Neural Information Processing Systems Conference.
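As a rough illustration of what a rescaling-invariant update can look like, the sketch below divides each gradient entry by a per-edge scaling factor: the sum, over all paths through that edge, of the squared weights of the other edges on the path. This is an assumed form of the update for a two-layer ReLU network without biases and with a squared loss; the names (`gradients`, `path_sgd_step`, `sgd_step`) and all numeric values are hypothetical, and the paper should be consulted for the actual algorithm.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, x):
    return [sum(w * a for w, a in zip(row, x)) for row in W]

def forward(W1, W2, x):
    return matvec(W2, relu(matvec(W1, x)))

def gradients(W1, W2, x, t):
    """Gradients of 0.5 * ||W2 relu(W1 x) - t||^2 w.r.t. W1 and W2."""
    h = matvec(W1, x)
    a = relu(h)
    err = [yk - tk for yk, tk in zip(matvec(W2, a), t)]
    gW2 = [[e * aj for aj in a] for e in err]
    da = [sum(err[k] * W2[k][j] for k in range(len(W2)))
          for j in range(len(W1))]
    gW1 = [[da[j] * (1.0 if h[j] > 0 else 0.0) * xi for xi in x]
           for j in range(len(W1))]
    return gW1, gW2

def sgd_step(W1, W2, x, t, lr):
    """Plain gradient step, for comparison."""
    gW1, gW2 = gradients(W1, W2, x, t)
    return ([[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W1, gW1)],
            [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W2, gW2)])

def path_sgd_step(W1, W2, x, t, lr):
    """Gradient step with each entry divided by its per-edge path factor.
    Assumes every factor is nonzero (no all-zero rows or columns)."""
    gW1, gW2 = gradients(W1, W2, x, t)
    hidden = range(len(W1))
    # Layer-1 edge into hidden unit j: remaining path is j -> output k.
    g_out = [sum(W2[k][j] ** 2 for k in range(len(W2))) for j in hidden]
    # Layer-2 edge out of hidden unit j: remaining path is input i -> j.
    g_in = [sum(w * w for w in W1[j]) for j in hidden]
    new_W1 = [[W1[j][i] - lr * gW1[j][i] / g_out[j] for i in range(len(x))]
              for j in hidden]
    new_W2 = [[W2[k][j] - lr * gW2[k][j] / g_in[j] for j in hidden]
              for k in range(len(W2))]
    return new_W1, new_W2

# Two parameterizations of the same function: hidden unit 0 rescaled by 4.
W1, W2 = [[0.5, -1.0], [2.0, 0.25]], [[1.0, -0.5]]
W1r, W2r = [[2.0, -4.0], [2.0, 0.25]], [[0.25, -0.5]]
x, t, lr = [2.0, 0.5], [0.0], 0.125

sgd_a = forward(*sgd_step(W1, W2, x, t, lr), x)[0]
sgd_b = forward(*sgd_step(W1r, W2r, x, t, lr), x)[0]
path_a = forward(*path_sgd_step(W1, W2, x, t, lr), x)[0]
path_b = forward(*path_sgd_step(W1r, W2r, x, t, lr), x)[0]

print("plain SGD, outputs after one step: ", sgd_a, sgd_b)
print("path-scaled, outputs after one step:", path_a, path_b)
```

Starting from two weight settings that compute the same function, the plain gradient step leads to networks with different outputs, whereas the path-scaled step in this sketch leads to networks that still agree. That is the sense in which such an update respects a geometry invariant to rescaling.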

deep neural network, path-normalized optimization, path-sgd
